我们介绍了第一个机器学习引力波搜索模拟数据挑战(MLGWSC-1)的结果。在这一挑战中,参与的小组必须从二进制黑洞合并中识别出复杂性和持续时间逐渐嵌入在逐渐更现实的噪声中的引力波信号。 4个提供的数据集中的决赛包含O3A观察的真实噪声,并发出了20秒的持续时间,其中包含进动效应和高阶模式。我们介绍了在提交前从参与者未知的1个月的测试数据中得出的6个输入算法的平均灵敏度距离和运行时。其中4个是机器学习算法。我们发现,最好的基于机器学习的算法能够以每月1个的错误警报率(FAR)的速度(FAR)实现基于匹配过滤的生产分析的敏感距离的95%。相反,对于真实的噪音,领先的机器学习搜索获得了70%。为了更高的范围,敏感距离缩小的差异缩小到某些数据集上选择机器学习提交的范围$ \ geq 200 $以优于传统搜索算法的程度。我们的结果表明,当前的机器学习搜索算法可能已经在有限的参数区域中对某些生产设置有用。为了改善最新的技术,机器学习算法需要降低他们能够检测信号并将其有效性扩展到参数空间区域的虚假警报率,在这些区域中,建模的搜索在计算上很昂贵。根据我们的发现,我们汇编了我们认为,将机器学习搜索提升到重力波信号检测中的宝贵工具,我们认为这是最重要的研究领域。
translated by 谷歌翻译
基于空间的重力波(GW)检测器将能够观察到来自当前基于地面检测的来源几乎不可能的信号。因此,建立的信号检测方法(匹配的过滤)将需要一个复杂的模板库,从而导致计算成本在实践中过于昂贵。在这里,我们为所有空间GW来源开发了高准确的GW信号检测和提取方法。作为概念的证明,我们表明,科学驱动和统一的多阶段深神经网络可以识别出浸入高斯噪声中的合成信号。与目标信号相比,我们的方法具有超过99%的信号检测准确性,同时获得至少95%的相似性。我们进一步证明了几种扩展场景的解释性和强烈的概括行为。
translated by 谷歌翻译
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc, revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network as target perform better than that using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over the scratch MIM pre-training on ImageNet-1K classification, using all the ViT-Tiny, ViT-Small, and ViT-base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in AE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
translated by 谷歌翻译
This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as the cluster centers to minimize the maximum within-cluster distance. This algorithm is based on a reduced-space branch and bound scheme and guarantees convergence to the global optimum in a finite number of steps by only branching on the regions of centers. To improve efficiency, we have designed a two-stage decomposable lower bound, the solution of which can be derived in a closed form. In addition, we also propose several acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets have demonstrated that our algorithm can solve the K-center problems to global optimal within 4 hours for ten million samples in the serial mode and one billion samples in the parallel mode. Moreover, compared with the state-of-the-art heuristic methods, the global optimum obtained by our algorithm can averagely reduce the objective function by 25.8% on all the synthetic and real-world datasets.
translated by 谷歌翻译
Score-based diffusion models have captured widespread attention and funded fast progress of recent vision generative tasks. In this paper, we focus on diffusion model backbone which has been much neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements the performance of vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and the training pipelines for diffusion-based generative models.
translated by 谷歌翻译
Deep learning-based methods have achieved significant performance for image defogging. However, existing methods are mainly developed for land scenes and perform poorly when dealing with overwater foggy images, since overwater scenes typically contain large expanses of sky and water. In this work, we propose a Prior map Guided CycleGAN (PG-CycleGAN) for defogging of images with overwater scenes. To promote the recovery of the objects on water in the image, two loss functions are exploited for the network where a prior map is designed to invert the dark channel and the min-max normalization is used to suppress the sky and emphasize objects. However, due to the unpaired training set, the network may learn an under-constrained domain mapping from foggy to fog-free image, leading to artifacts and loss of details. Thus, we propose an intuitive Upscaling Inception Module (UIM) and a Long-range Residual Coarse-to-fine framework (LRC) to mitigate this issue. Extensive experiments on qualitative and quantitative comparisons demonstrate that the proposed method outperforms the state-of-the-art supervised, semi-supervised, and unsupervised defogging approaches.
translated by 谷歌翻译
Conversational recommender systems (CRSs) often utilize external knowledge graphs (KGs) to introduce rich semantic information and recommend relevant items through natural language dialogues. However, original KGs employed in existing CRSs are often incomplete and sparse, which limits the reasoning capability in recommendation. Moreover, only few of existing studies exploit the dialogue context to dynamically refine knowledge from KGs for better recommendation. To address the above issues, we propose the Variational Reasoning over Incomplete KGs Conversational Recommender (VRICR). Our key idea is to incorporate the large dialogue corpus naturally accompanied with CRSs to enhance the incomplete KGs; and perform dynamic knowledge reasoning conditioned on the dialogue context. Specifically, we denote the dialogue-specific subgraphs of KGs as latent variables with categorical priors for adaptive knowledge graphs refactor. We propose a variational Bayesian method to approximate posterior distributions over dialogue-specific subgraphs, which not only leverages the dialogue corpus for restructuring missing entity relations but also dynamically selects knowledge based on the dialogue context. Finally, we infuse the dialogue-specific subgraphs to decode the recommendation and responses. We conduct experiments on two benchmark CRSs datasets. Experimental results confirm the effectiveness of our proposed method.
translated by 谷歌翻译
Automatic image colorization is a particularly challenging problem. Due to the high illness of the problem and multi-modal uncertainty, directly training a deep neural network usually leads to incorrect semantic colors and low color richness. Existing transformer-based methods can deliver better results but highly depend on hand-crafted dataset-level empirical distribution priors. In this work, we propose DDColor, a new end-to-end method with dual decoders, for image colorization. More specifically, we design a multi-scale image decoder and a transformer-based color decoder. The former manages to restore the spatial resolution of the image, while the latter establishes the correlation between semantic representations and color queries via cross-attention. The two decoders incorporate to learn semantic-aware color embedding by leveraging the multi-scale visual features. With the help of these two decoders, our method succeeds in producing semantically consistent and visually plausible colorization results without any additional priors. In addition, a simple but effective colorfulness loss is introduced to further improve the color richness of generated results. Our extensive experiments demonstrate that the proposed DDColor achieves significantly superior performance to existing state-of-the-art works both quantitatively and qualitatively. Codes will be made publicly available.
translated by 谷歌翻译
We address the theoretical and practical problems related to the trajectory generation and tracking control of tail-sitter UAVs. Theoretically, we focus on the differential flatness property with full exploitation of actual UAV aerodynamic models, which lays a foundation for generating dynamically feasible trajectory and achieving high-performance tracking control. We have found that a tail-sitter is differentially flat with accurate aerodynamic models within the entire flight envelope, by specifying coordinate flight condition and choosing the vehicle position as the flat output. This fundamental property allows us to fully exploit the high-fidelity aerodynamic models in the trajectory planning and tracking control to achieve accurate tail-sitter flights. Particularly, an optimization-based trajectory planner for tail-sitters is proposed to design high-quality, smooth trajectories with consideration of kinodynamic constraints, singularity-free constraints and actuator saturation. The planned trajectory of flat output is transformed to state trajectory in real-time with consideration of wind in environments. To track the state trajectory, a global, singularity-free, and minimally-parameterized on-manifold MPC is developed, which fully leverages the accurate aerodynamic model to achieve high-accuracy trajectory tracking within the whole flight envelope. The effectiveness of the proposed framework is demonstrated through extensive real-world experiments in both indoor and outdoor field tests, including agile SE(3) flight through consecutive narrow windows requiring specific attitude and with speed up to 10m/s, typical tail-sitter maneuvers (transition, level flight and loiter) with speed up to 20m/s, and extremely aggressive aerobatic maneuvers (Wingover, Loop, Vertical Eight and Cuban Eight) with acceleration up to 2.5g.
translated by 谷歌翻译
Curriculum learning and self-paced learning are the training strategies that gradually feed the samples from easy to more complex. They have captivated increasing attention due to their excellent performance in robotic vision. Most recent works focus on designing curricula based on difficulty levels in input samples or smoothing the feature maps. However, smoothing labels to control the learning utility in a curriculum manner is still unexplored. In this work, we design a paced curriculum by label smoothing (P-CBLS) using paced learning with uniform label smoothing (ULS) for classification tasks and fuse uniform and spatially varying label smoothing (SVLS) for semantic segmentation tasks in a curriculum manner. In ULS and SVLS, a bigger smoothing factor value enforces a heavy smoothing penalty in the true label and limits learning less information. Therefore, we design the curriculum by label smoothing (CBLS). We set a bigger smoothing value at the beginning of training and gradually decreased it to zero to control the model learning utility from lower to higher. We also designed a confidence-aware pacing function and combined it with our CBLS to investigate the benefits of various curricula. The proposed techniques are validated on four robotic surgery datasets of multi-class, multi-label classification, captioning, and segmentation tasks. We also investigate the robustness of our method by corrupting validation data into different severity levels. Our extensive analysis shows that the proposed method improves prediction accuracy and robustness.
translated by 谷歌翻译